460 research outputs found

    Evolutionary Algorithms for Reinforcement Learning

    There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.
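    As a concrete illustration of searching in policy space, here is a minimal sketch in Python: a small (mu, lambda) evolution strategy mutates policy parameters directly, with a quadratic fitness standing in for an episodic environment return. All names and constants are illustrative assumptions, not taken from the article.

        import numpy as np

        # Hedged sketch of evolutionary policy search: the fitness below is a
        # placeholder for a real environment rollout; credit assignment is
        # simply the episodic return of each candidate policy.
        rng = np.random.default_rng(0)

        def episode_return(theta):
            # Placeholder "return": peaks at theta = [1.0, -2.0, 0.5].
            target = np.array([1.0, -2.0, 0.5])
            return -float(np.sum((theta - target) ** 2))

        mu, lam, sigma = 5, 20, 0.3      # parents, offspring, mutation scale
        population = [rng.normal(size=3) for _ in range(lam)]

        for generation in range(50):
            ranked = sorted(population, key=episode_return, reverse=True)
            parents = ranked[:mu]
            # Gaussian mutation as a generic, problem-agnostic genetic operator.
            population = [p + sigma * rng.normal(size=3)
                          for p in parents for _ in range(lam // mu)]

        best = max(population, key=episode_return)
        print("best policy parameters:", best)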

    All-sky Medium Energy Gamma-ray Observatory: Exploring the Extreme Multimessenger Universe

    The All-sky Medium Energy Gamma-ray Observatory (AMEGO) is a probe-class mission concept that will provide essential contributions to multimessenger astrophysics in the late 2020s and beyond. AMEGO combines high sensitivity in the 200 keV to 10 GeV energy range with a wide field of view, good spectral resolution, and polarization sensitivity. Therefore, AMEGO is key to the study of multimessenger astrophysical objects that have unique signatures in the gamma-ray regime, such as neutron star mergers, supernovae, and flaring active galactic nuclei. The order-of-magnitude improvement in sensitivity compared to previous MeV missions also enables discoveries of a wide range of phenomena whose energy output peaks in the relatively unexplored medium-energy gamma-ray band.

    RTFM: Generalising to New Environment Dynamics via Reading

    Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2π, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2π generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2π produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
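    As a hedged sketch of FiLM-style conditioning (one of the baselines above, not the txt2π architecture itself), the PyTorch snippet below uses a text encoding to produce per-channel scale and shift parameters that modulate visual features. All shapes and module sizes are assumptions for illustration.

        import torch
        import torch.nn as nn

        class FiLMBlock(nn.Module):
            # Language-conditioned modulation of convolutional features.
            def __init__(self, channels=16, text_dim=32):
                super().__init__()
                self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
                self.film = nn.Linear(text_dim, 2 * channels)  # -> (gamma, beta)

            def forward(self, feats, text):
                gamma, beta = self.film(text).chunk(2, dim=-1)
                h = self.conv(feats)
                # Broadcast the language-derived scale/shift over spatial dims.
                return torch.relu(gamma[..., None, None] * h + beta[..., None, None])

        block = FiLMBlock()
        obs_feats = torch.randn(4, 16, 8, 8)     # batch of grid observations
        text_enc = torch.randn(4, 32)            # encoded goal/document text
        print(block(obs_feats, text_enc).shape)  # torch.Size([4, 16, 8, 8])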

    Prioritized Level Replay

    Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show that TD-errors effectively estimate a level’s future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample efficiency and generalization on the Procgen Benchmark, matching the previous state-of-the-art in test return, and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over a 76% improvement in test return relative to standard RL baselines.
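    A minimal sketch of the sampling rule described above, assuming rank-based prioritization over scores given by the mean absolute TD-error from a level's most recent visit; the paper's staleness weighting and the rollouts that produce TD-errors are stubbed out.

        import math
        import random

        class LevelSampler:
            def __init__(self, levels, beta=0.1):
                self.levels = list(levels)
                self.beta = beta
                # Unseen levels get an infinite score so they are visited first.
                self.scores = {lvl: math.inf for lvl in self.levels}

            def sample(self):
                # P(level) proportional to (1 / rank)**(1 / beta).
                ranked = sorted(self.levels, key=lambda l: self.scores[l],
                                reverse=True)
                weights = [(1.0 / (rank + 1)) ** (1.0 / self.beta)
                           for rank in range(len(ranked))]
                return random.choices(ranked, weights=weights, k=1)[0]

            def update(self, level, td_errors):
                # Learning potential: mean |TD-error| from the latest visit.
                self.scores[level] = sum(abs(d) for d in td_errors) / len(td_errors)

        sampler = LevelSampler(range(8))
        level = sampler.sample()
        sampler.update(level, td_errors=[0.5, -0.2, 0.9])  # from a stubbed rollout
        print("next level:", sampler.sample())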

    General intelligence requires rethinking exploration

    We are at the cusp of a transition from 'learning from data' to 'learning what data to learn from' as a central focus of artificial intelligence (AI) research. While the first-order learning problem is not completely solved, large models under unified architectures, such as transformers, have shifted the learning bottleneck from how to effectively train models to how to effectively acquire and use task-relevant data. This problem, which we frame as exploration, is a universal aspect of learning in open-ended domains like the real world. Although the study of exploration in AI is largely limited to the field of reinforcement learning, we argue that exploration is essential to all learning systems, including supervised learning. We propose the problem of generalized exploration to conceptually unify exploration-driven learning between supervised learning and reinforcement learning, allowing us to highlight key similarities across learning settings and open research challenges. Importantly, generalized exploration is a necessary objective for maintaining open-ended learning processes, which, by continually learning to discover and solve new problems, provide a promising path to more general intelligence.

    A MOS-based Dynamic Memetic Differential Evolution Algorithm for Continuous Optimization: A Scalability Test

    Continuous optimization is one of the most active areas in the field of heuristic optimization. Many algorithms have been proposed and compared on several function benchmarks, with performance that varies across problems. For this reason, combining different search strategies seems desirable in order to exploit the strengths of each approach. This contribution explores the use of a hybrid memetic algorithm based on the Multiple Offspring Sampling (MOS) framework. The proposed algorithm combines the explorative/exploitative strengths of two heuristic search methods that separately obtain very competitive results. This algorithm has been tested on the benchmark problems and under the conditions defined for the special issue of the Soft Computing Journal on Scalability of Evolutionary Algorithms and other Metaheuristics for Large Scale Continuous Optimization Problems. The proposed algorithm obtained the best results compared with both its composing algorithms and a set of reference algorithms proposed for the special issue.
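    As a hedged illustration of this kind of hybrid, the sketch below couples a classic DE/rand/1/bin differential-evolution step with a cheap local perturbation search around the incumbent best. The MOS machinery for dynamically allocating effort between strategies is omitted, and the objective and constants are placeholders.

        import numpy as np

        rng = np.random.default_rng(1)

        def sphere(x):                 # stand-in objective; to be minimized
            return float(np.sum(x ** 2))

        dim, pop_size, F, CR = 10, 30, 0.5, 0.9
        pop = rng.uniform(-5, 5, size=(pop_size, dim))

        for gen in range(200):
            for i in range(pop_size):
                a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
                mutant = a + F * (b - c)             # differential mutation
                cross = rng.random(dim) < CR         # binomial crossover mask
                cross[rng.integers(dim)] = True      # ensure one gene crosses
                trial = np.where(cross, mutant, pop[i])
                if sphere(trial) <= sphere(pop[i]):  # greedy selection
                    pop[i] = trial
            # Memetic step: local perturbation search around the current best.
            best_idx = int(np.argmin([sphere(x) for x in pop]))
            probe = pop[best_idx] + rng.normal(scale=0.1, size=dim)
            if sphere(probe) < sphere(pop[best_idx]):
                pop[best_idx] = probe

        print("best value found:", min(sphere(x) for x in pop))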

    Observational Artifacts of NuSTAR: Ghost Rays and Stray Light

    The Nuclear Spectroscopic Telescope Array (NuSTAR), launched in June 2012, flies two conical-approximation Wolter-I mirrors at the end of a 10.15 m mast. The optics are coated with Pt/C and W/Si multilayers and operate from 3 to 80 keV. Since the optical path is not shrouded, aperture stops are used to shield the detectors from background and from sources outside the field of view. However, there is still a sliver of sky (roughly 1.0 to 4.0 degrees) from which photons may bypass the optics altogether and fall directly on the detector array; we term these photons stray light. There are also photons that do not undergo the focused double reflection in the optics; we term these ghost rays. We present a detailed analysis and characterization of these two components and discuss how they impact observations. Finally, we discuss how they could have been prevented and should be in future observatories. (Published in the Journal of Astronomical Telescopes, Instruments, and Systems, open access: http://dx.doi.org/10.1117/1.JATIS.3.4.04400)

    Improving Policy Learning via Language Dynamics Distillation

    Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions, and then fine-tunes these language-aware pretrained representations via reinforcement learning (RL). In this way, the model is trained to both maximize expected reward and retain knowledge about how language relates to environment dynamics. On SILG, a benchmark of five tasks with language descriptions that evaluate distinct generalization challenges on unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD outperforms tabula-rasa RL, VAE pretraining, and methods that learn from unlabeled demonstrations in inverse RL and reward shaping with pretrained experts. In our analyses, we show that language descriptions in demonstrations improve sample efficiency and generalization across environments, and that dynamics modeling with expert demonstrations is more effective than with non-experts.
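    A minimal sketch of the two-stage recipe, with made-up dimensions and random tensors standing in for demonstrations: first pretrain an encoder by predicting the next observation from the current observation, its language description, and the demonstrated action; then reuse that encoder to initialize the policy for RL fine-tuning.

        import torch
        import torch.nn as nn

        obs_dim, text_dim, act_dim, hid = 32, 16, 4, 64

        encoder = nn.Sequential(nn.Linear(obs_dim + text_dim, hid), nn.ReLU())
        dynamics_head = nn.Linear(hid + act_dim, obs_dim)  # predicts next obs
        opt = torch.optim.Adam([*encoder.parameters(),
                                *dynamics_head.parameters()])

        # Stage 1: dynamics pretraining on (obs, text, action, next_obs) demos.
        for _ in range(100):
            obs, text = torch.randn(8, obs_dim), torch.randn(8, text_dim)
            act, next_obs = torch.randn(8, act_dim), torch.randn(8, obs_dim)
            z = encoder(torch.cat([obs, text], dim=-1))
            pred = dynamics_head(torch.cat([z, act], dim=-1))
            loss = nn.functional.mse_loss(pred, next_obs)
            opt.zero_grad(); loss.backward(); opt.step()

        # Stage 2: the language-aware encoder initializes the RL policy,
        # which would then be fine-tuned with PPO or similar.
        policy = nn.Sequential(encoder, nn.Linear(hid, act_dim))
        sample = torch.cat([torch.randn(1, obs_dim), torch.randn(1, text_dim)], -1)
        print(policy(sample).shape)  # torch.Size([1, 4])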

    Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

    Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely trained specifically to perform them. Instead, they are usually trained end-to-end, in the hope that useful skills will be learned implicitly in order to maximise the discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method, Hierarchical Kickstarting (HKS), which is shown to outperform all other evaluated methods. Our experiments show that learning with prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards.
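    A hedged sketch of a kickstarting-style objective (not necessarily the exact HKS formulation): the student's RL loss is augmented with a KL term pulling its action distribution toward a pretrained skill teacher's, with the teacher's influence annealed over training. Logits here are random placeholders, and the hierarchical skill-selection logic is omitted.

        import torch
        import torch.nn.functional as F

        def kickstart_loss(student_logits, teacher_logits, rl_loss, step,
                           lam0=1.0, decay=1e-4):
            # Anneal the teacher's influence to zero as training progresses.
            lam = lam0 * max(0.0, 1.0 - decay * step)
            distill = F.kl_div(F.log_softmax(student_logits, dim=-1),
                               F.softmax(teacher_logits, dim=-1),
                               reduction="batchmean")
            return rl_loss + lam * distill

        student = torch.randn(8, 6, requires_grad=True)  # action logits
        teacher = torch.randn(8, 6)                      # frozen skill teacher
        loss = kickstart_loss(student, teacher,
                              rl_loss=torch.tensor(0.5), step=1000)
        loss.backward()
        print("combined loss:", float(loss))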
